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ABSTRACT 



Background: Celiac disease (CD) is a disorder associated with body reaction to gluten. After 
the gluten intake, an immune reaction against the protein occurs and damages villi of small 
intestine in celiac patients gradually. 

Objectives: The OSC, a filtering method for minimization of inter- and intra-spectrometer vari- 
ations that influence on data acquisition, was applied to biofluid NMR data of CD patients. 
Patients and Methods: In this study, metabolites of total 56 serum samples from 12 CD pa- 
tients, 15 CD patients taking gluten-free diet (GFD), and 29 healthy cases were analyzed using 
nuclear magnetic resonance (NMR) and associated theoretical analysis. Employing ProMetab 
(version ProMetab_v3_3) software, data obtained from NMR spectra were reduced and orthog- 
onal signal correction (OSC) effect on celiac disease metabonomics before and after the separa- 
tion by principle component analysis (PCA) was investigated. 

Results: The three groups were separated by OSC and findings were analyzed by partial least 
squares discriminant analysis (PLS-DA) method. Root mean square error of calibration (RM- 
SEc) and correlation coefficient of calibration (Rc) for PLS-DA referred to an efficient group 
separation filtered by OSC. 

Conclusions: The applied leave-one-out cross-validation to PLS-DA method performed along 

with OSC confirmed validation of data analysis. Finally four metabolites are introduced as CD 

biomarkers. ^ .„ . , 
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► Implication for health policy/practice/research/medical education: 

*H NMR-Based Metabonomic showed that four metabolites were introduced to differentiate between celiac patients on GFD, celiac 
patients without specific diet, and healthy people. The results of present study could help someone who desire to focus on the In- 
vestigation of complex metabolic systems such as disease mechanisms, toxic reactions, and genetic manipulations. 
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1. Background 

Celiac desease (CD) is a disorder caused essentially 
by body reaction to the gluten. Intake of foods contain- 
ing gluten promotes an immune reaction against the 
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protein that damages villi of small intestine in celiac 
patients gradually. Consequently, lack of vitamins, min- 
erals, and other essential nutrients occurs. Therefore, ce- 
liac patients are at the risk of malnutrition, anemia and 
osteoporosis from which the anemia attributes to iron 
deficiency and results in declining red blood cell efficien- 
cy and the osteoporosis may represent as fragile bones 
caused by lack of calcium (l). 

Full analysis of a living organism can be achieved by an 
integrated set of 'omics' approaches including metabo- 
nomics, genomics, transcriptomics, and proteomics in 
order to increase data complexity. Metabonomics gen- 
erates sufficient quantitative or qualitative metabolic 
data for analytical studies of biological systems (2-7) and 
is originally defined as 'the quantitative measurement 
of the dynamic multi-parametric metabolic response of 
living systems to pathophysiological stimuli or genetic 
modification'(5). 

Investigation of complex metabolic systems such as dis- 
ease mechanisms, toxic reactions, and genetic manipula- 
tions requires full analytical data sets. Nuclear magnetic 
resonance (NMR) and mass spectrometry (MS) are the most 
conventional techniques for metabolic profiling (8-10). 

Multivariate statistical approaches are also suitable 
methods for extraction metabolic data associated with 
each level of dynamic processes. Since NMR method 
produces various and highly correlated data, employing 
multivariate methods such as PCA prior to any further 
data analysis is essential to choose efficient number of 
descriptors (11). 

2. Objectives 

In this study OSC, a filtering method for minimization 
of inter- and intra-spectrometer variations that influence 
on data acquisition, was applied to biofluid NMR data of 
CD patients. 

3. Patients and Methods 

3. 1. Blood Samples 

Using syringes, two milliliters of blood samples were 
drawn from antecubital vein of each case through a sin- 
gle puncture followed by immediate serum separation 
via centrifugation, and storage at -20°C. Experimental 
data was obtained by placing serum samples in 5mm 
NMR tubes. 600 jjL serum samples each were diluted by 
100 \iL D 2 0 to provide a field-frequency lock. 

The experimental data consisted of 56 serum samples 
including 12 samples from celiac patients on gluten-free 
diet, 15 samples from patients without a specific diet, and 
29 samples from healthy cases. 

3. 2. 1 HNMR Spectroscopy 

A BrukerAvance DRX 600 spectrometer operating at 
500 MHz at 300 K using the Carr-Purcell-Meiboom-Gill 



(CPMG) spin-echo sequence (12) with pre saturation was 
employed for recording *H NMR spectra. QNP probe was 
used in this experiment. 

Spin-echo loop time (2nt) and relaxation delay for re- 
corded spectra were 43.9 ms and 2s, respectively. A total 
of 128 transients were collected into 32k data points using 
a spectral width of 8389.26 Hz and an acquisition time of 
1.95 s. Prior to Fourier transformation, an exponential 
line broadening function of 0.30 Hz was applied to the 
Free Induction Decay (FID). 

3.3. Data Pre-Processing 

Employing Prometab software (version Prometab_ 
v3_3), correction of baseline was carried out according to 
4, 4-Dimethyl-4-Silapentane-l-Sulfonic acid (DSS) as refer- 
ence. The number of variables was reduced to 245 in each 
spectrum by integrating spectral intensity in regions of 
equal width [0.04 parts per million (ppm) over the range 
0.2-10.0 ppm]. In addition, the region of (S = 4.50-5.98) 
was excluded from the analysis to avoid unauthentic ef- 
fects of variability in the suppression of water resonance. 
Therefore, the number of variables was reduced to 205. 
All spectra were normalized to constant total intensity. 

3.4. Statistical Analysis 

3.4.1. OSC for Classification Modeling 

A response matrix (Y) describing variation between de- 
fined sample classes was constructed and the variation 
in X orthogonal to Y was removed by applying OSC. The 
filtered data matrix, X osc , that included class-correlated 
variations was modeled by subsequent multivariate 
modeling methods such as PCA or PLS-DA(14). OSC Mod- 
eling eliminates variables that are not correlated with 
desirable features from J H NMR biofluid data; as a result 
calculated values in multivariate models motivate only 
class separation. In this study, OSC was applied to biofluid 
NMR data prior to chemometric analysis to minimize in- 
fluences of inter- and intra-spectrometer variations dur- 
ing data acquisition. Moreover, it was employed to elimi- 
nate physiological variation from data sets (13-17). 

3.4.2. Principal Components Analysis (PCA) 

PCA is a conventional technique for multivariate analy- 
sis and mainly employed for multivariate data represen- 
tation in a low-dimensional space that means it describes 
maximum possible number of variables with minimum 
possible number of principal components (PCs). In PCA, 
principal components are named consecutively starting 
from PC 1 until total variance is defined. PC 1 or first prin- 
cipal component is a line that goes through the points 
in a variable space and best conserves relevant distances 
between objects and defined by a loading vector as fol- 
lows. PCj is a hidden variable and has maximum variance 
of the scores. Scores are predicted data values on the hid- 
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den variable (18). 

All calculations were performed in MATLAB 6.5 and PCA 
was implemented with the PLS-Toolbox Version 3.0. Both 
the graphical ('pcagui') and the command-line ('pea') ver- 
sions of PCA have been employed (19). 

3.4.3. Partial least Squares Discriminant Analysis (PIS- 
DA) 

PLS regression was directed by a response data set Y to 
derive components from descriptor data set X that best 
describe specified Y structure, as it maximizes the cova- 
riance that expresses common structure between X and 
Y(20-22). PLS is also divided into regression as well as dis- 
criminant analysis (PLS-DA). Classification by DA assigns 
the samples to proper separate classes which are repre- 
sented by using so-called 'dummy' variables (23). 

4. Results 

As shown in Figure la, data analysis leads to introduce 
two groups of control and celiac cases. Since separation 
of GDF group from other two groups was an important 
index in this study, more analytical efforts were effected 
by using OSC in order to achieve better transparency be- 
tween sample groups. The useful role of OSC in separa- 
tion groups is reported in several studies (24). By using 
OSC, 95.26% total variability was distributed between 
three PCs as pq (91.11%), PC 2 (2.88%), and PC 3 (1.47%) (See 
Figure lb). After successful division of samples into three 
groups, PLS-DA as regression extension of PCA was ap- 
plied to maximize the separation. It is reported that 
PLS-DA is useful tool for maximizing covariance between 
measured data (x) and response variable (y) (25). PLS-DA 
findings introduced three most influential metabolites 
that play important role in separation of groups. Serum 
level alteration of these metabolites in CD group com- 
pared to control group is shown in Table 1. 

The percentage of captured variance by PCA model is a 
suitable index for data validation (26); accordingly, three 
times PC s and their associated variance amounts were cal- 
culated and shown in Table 2 with total variance equal to 
92.22% out of which the percentage variance in step 1 was 
equal to 84.38%, and in second and third steps were 4.74 
and 3.12, respectively. Since 92.24% is close to 1, we may 
conclude that PCA model could separate the three groups 
in a real and valid manner. 

Score plots of PLS-DA without and with application of 
OSC for three groups are shown in Figure 2. The useful 



Table 2. Percent Variances Captured by PCA Model. First column corre- 
sponds to steps of model application and the second and third columns 
refer to variance of each step and total variance, respectively. 



Number 


% Variance 


% Variance Total 


1 


84.38 


84.38 


2 


4.74 


89.12 


3 


3.12 


92.24 



Table 3. Summary of PLS-DA and OSC- PLS-DA Models of NMR Spectra. LVs refer 
to number of latent variances and Vx and Vy correspond to NMR data matrix 
and respond matrix, respectively Vx (cuml, %) and Vy show total variances. 



LVs 


v x (%) 


V x (cuml,%) V Y (%) 


V Y (cuml,%) 


PLS DA Model 


1 


63.96 


63.96 44.07 


44.07 


2 


9.05 


73.01 16.34 


60.41 


3 


2.54 


75.55 8.24 


66.73 


OSC- PLS DA Model 


1 


75.25 


75.25 100.00 


100.00 


2 


6.11 


81.36 0.00 


100.00 


3 


5.46 


86.82 0.00 


100.00 
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Figure 1. Score Plot PCA a) Without OSC, and b) With OSC 



Table 1. PCA-Detected 1H NMR Spectral Regions that Separate Significantly CD and Control Groups Based on Metabolites Levels 



Chemical Shift (ppm) 


Metabolite 


Alteration of Metabolite Level in CD Group Compared to Control Group 


0.86 


lipid (mainly VLDL) 


Decrease 


1.26-1.3 


Lipid 


Decrease 


1.34 


Lactate 


Decrease 


3.22 


Choline 


Increase 
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Figure 2. Score Plot PLS-DA a) Without OSC, and b) With OSC 



5. Discussion 

According to orthogonal theory in mathematics, OSC 
can eliminate data in X matrix that are orthogonal with 
response Y matrix. The response Y matrix is class variable. 
The Y matrix variables determined as control, CD, and 
GFD groups were assigned to as 0 , 1, and 2, respectively. 
Orthogonal component with eigenvalues greater than 
1 were eliminated. In order to perform metabonomics 
analysis of recorded ^ NMR spectra of healthy, celiac, and 
GFD groups, two different recognition pattern methods 
before and after OSC were applied. Score plots of PCA are 
drawn using the MATLAB software. Figure la and Figure lb 
demonstrate PCA score plots of NMR spectra without and 
with OSC, respectively. 

On the basis of this study, four metabolites were intro- 
duced to differentiate between celiac patients on GFD, ce- 
liac patients without specific diet, and healthy people. We 
hope that further investigations lead to determine exact 
metabonomic pattern of CD. 
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parameter to analyze PLS-DA findings is latent variable 
variance (LV) (16). LVs of represented data in Figure 2 for 
three steps of PLS-DA were calculated and shown in Table 
3. There are several evidences implicate that total vari- 
ance value above 80% is an acceptable index for data vali- 
dation. Accumulated PLS-DA score plot by using OSC for 
three LV variances was 86.82% (see Table 3) that revealed a 
clear separation between the three groups; however this 
procedure without applOSC application (identified by an 
accumulated score plot of 75.55%) could not be consid- 
ered as a suitable analytical method. 

RMSEc and Rc values for PLS-DA calibration without ap- 
plying OSC were equal to 0.6682 and 0.6348, respectively. 
By using OSC, these values changed to 0.4226 and 0.8629, 
respectively. Amounts of Rc close to 1 correspond to ac- 
ceptable data analysis (27) that means the calculated Rc 
value of 0.8629 refers to application of a proper analyti- 
cal method for group separation. Besides this, OSC appli- 
cation reduced RMSEc value, that means an error reduc- 
tion occurred. For better clarity, the applied OSC model 
was performed accompanied by leave-one-out cross-vali- 
dation. The characterized findings were 0.5175 and 0.8510 
for RMSEcv and Rev, respectively. The low amount of error 
and Rev close to 1, confirm sufficient validity of findings. 
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